Fresno County
CoddLLM: Empowering Large Language Models for Data Analytics
Zhang, Jiani, Zhang, Hengrui, Chakravarti, Rishav, Hu, Yiqun, Ng, Patrick, Katsifodimos, Asterios, Rangwala, Huzefa, Karypis, George, Halevy, Alon
Large Language Models (LLMs) have the potential to revolutionize data analytics by simplifying tasks such as data discovery and SQL query synthesis through natural language interactions. This work serves as a pivotal first step toward the development of foundation models explicitly designed for data analytics applications. To propel this vision forward, we unveil a new data recipe for post-training LLMs, enhancing their comprehension of data management and empowering them to tackle complex real-world analytics tasks. Specifically, our innovative approach includes a scalable synthetic data generation method that enables the creation of a broad spectrum of topics centered on data representation and manipulation. Furthermore, we introduce two new tasks that seamlessly bridge tables and text. We show that such tasks can enhance models' understanding of schema creation and the nuanced translation between natural language and tabular data. Leveraging this data recipe, we post-train a new foundation model, named CoddLLM, based on Mistral-NeMo-12B. To assess the language understanding and reasoning capabilities of LLMs in the realm of data analytics, we contribute AnalyticsMMLU, a benchmark containing thousands of multiple-choice questions on databases, data analysis, and machine learning. Our focus on data discovery has resulted in the contribution of three comprehensive benchmarks that address both database and data lake scenarios. CoddLLM not only excels in performance but also sets a new standard, achieving the highest average accuracy across eight datasets. It outperforms GPT-3.5-Turbo on AnalyticsMMLU, exceeding GPT-4o by 12.1% in table selection and showing an average improvement of 24.9% in Text-to-SQL compared to the base model.
ACLU highlights the rise of AI-generated police reports -- what could go wrong?
The American Civil Liberties Union (ACLU) is sounding a warning about the use of AI in creating police reports, saying the tech could produce errors that affect evidence and court cases. The nonprofit highlighted the dangers of the tech in a white paper, following news that police departments in California are using a program called Draft One from Axon to transcribe body camera recordings and create a first draft of police reports. One police department in Fresno said that it's using Draft One under a pilot program, but only for misdemeanor reports. "It's nothing more than a template," deputy chief Rob Beckwith told Industry Insider. "It's not designed to have an officer push a button and generate a report." He said that the department hasn't seen any errors with transcriptions and that it consulted with the Fresno County DA's office in training the force. However, the ACLU noted four issues with the use of AI.
Fake blood and gunfire? A California lawmaker wants to create rules for shooter drills
At a Fresno County elementary school, a masked man with a fake gun carried out an active-shooter drill without most of the teachers and parents being informed ahead of time. At San Marino High School, police officers planned to fire blanks to mimic the sound of gunfire, but the drill was ultimately canceled over concerns of traumatizing students. More recently, a principal at a San Gabriel elementary school was placed on a leave of absence after allegedly using her fingers to mime holding a gun and pretending to shoot kids, telling them, "Boom. The rise in active-shooter drills at American schools has coincided with the growing phenomenon of mass shootings in the U.S., as well as media coverage focused on school massacres including Columbine, Sandy Hook and Uvalde. These drills have taken place at 95% of U.S. public schools as of the 2015-16 school year, according to the Education Department's National Center for Education Statistics.
Traffic estimation in unobserved network locations using data-driven macroscopic models
This paper leverages macroscopic models and multi-source spatiotemporal data collected from automatic traffic counters and probe vehicles to accurately estimate traffic flow and travel time in links where these measurements are unavailable. This problem is critical in transportation planning applications where the sensor coverage is low and the planned interventions have network-wide impacts. The proposed model, named the Macroscopic Traffic Estimator (MaTE), can perform network-wide estimations of traffic flow and travel time only using the set of observed measurements of these quantities. Because MaTE is grounded in macroscopic flow theory, all parameters and variables are interpretable. The estimated traffic flow satisfies fundamental flow conservation constraints and exhibits an increasing monotonic relationship with the estimated travel time. Using logit-based stochastic traffic assignment as the principle for routing flow behavior makes the model fully differentiable with respect to the model parameters. This property facilitates the application of computational graphs to learn parameters from vast amounts of spatiotemporal data. We also integrate neural networks and polynomial kernel functions to capture link flow interactions and enrich the mapping of traffic flows into travel times. MaTE also adds a destination choice model and a trip generation model that uses historical data on the number of trips generated by location. Experiments on synthetic data show that the model can accurately estimate travel time and traffic flow in out-of-sample links. Results obtained using real-world multi-source data from a large-scale transportation network suggest that MaTE outperforms data-driven benchmarks, especially in travel time estimation. The estimated parameters of MaTE are also informative about the hourly change in travel demand and supply characteristics of the transportation network.
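The logit-based routing principle mentioned in the abstract can be illustrated with a minimal sketch. This is not the MaTE implementation: the toy network, the route travel times, the dispersion parameter `theta`, and the function names are all hypothetical, chosen only to show how logit route choice splits demand across routes and how the resulting link flows respect flow conservation.

```python
import math


def logit_route_probabilities(route_times, theta=1.0):
    """Return logit choice probabilities for routes given their travel times.

    theta is an assumed dispersion parameter: larger values make travelers
    more sensitive to travel-time differences.
    """
    utilities = [-theta * t for t in route_times]
    # Subtract the maximum utility for numerical stability.
    max_u = max(utilities)
    weights = [math.exp(u - max_u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def link_flows(routes, route_times, demand, theta=1.0):
    """Distribute demand over routes and accumulate the flow on each link."""
    probs = logit_route_probabilities(route_times, theta)
    flows = {}
    for route, p in zip(routes, probs):
        for link in route:
            flows[link] = flows.get(link, 0.0) + demand * p
    return flows


# Hypothetical toy network: two routes from origin A to destination D.
routes = [("A-B", "B-D"), ("A-C", "C-D")]
route_times = [10.0, 12.0]  # assumed route travel times (minutes)
probs = logit_route_probabilities(route_times, theta=0.5)
flows = link_flows(routes, route_times, demand=100.0, theta=0.5)
print(probs)
print(flows)
```

Because the probabilities are a smooth function of travel times and `theta`, this kind of assignment rule is differentiable, which is the property the abstract relies on for learning parameters with computational graphs.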
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Trienes, Jan, Joseph, Sebastian, Schlötterer, Jörg, Seifert, Christin, Lo, Kyle, Xu, Wei, Wallace, Byron C., Li, Junyi Jessy
Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply standards similar to those of humans for what constitutes information loss.
Cruise Looks to Solar Panels to Power Self-Driving Cars
Cruise, the San Francisco autonomous car company backed by General Motors, is launching a new initiative to support renewable energy efforts in California's Central Valley. Through a program called Farm to Fleet, Cruise will source solar power for its all-electric fleet from two farms: Sundale Vineyards outside Tulare and Moonlight Companies in Reedley (Fresno County). Sundale Vineyards grows table grapes, and Moonlight is a citrus and stone fruit grower. Both of them also have solar panel installations -- and they'll now support Cruise as it tries to expand the number of electric cars on the road in California.
Rural California schools have been open for months. It's taken a learning curve
Tabatha Plew quit her good-paying construction job in August, pulled her kids out of a Central Valley school they loved and moved seven hours north to this tiny town in Trinity County. Like a lot of rural communities, Weaverville in recent years has seen more people leaving than arriving, but it had a golden commodity Plew couldn't find at home in Fresno County for her three children: open classrooms that promised a desk in front of a teacher. "I packed them up, and I told my husband, 'We love you. See you on the weekends,'" said Plew, who moved into her in-laws' home in Weaverville. "This was the highest-paying job I've ever had, and, you know, the money didn't mean anything when my kids were struggling."
Google parent Alphabet has grand global plan to breed disease-carrying mosquitoes out of existence
SAN FRANCISCO – Silicon Valley researchers are attacking flying bloodsuckers in California's Fresno County. A white high-top Mercedes van winds its way through the suburban sprawl and strip malls as a swarm of male Aedes aegypti mosquitoes shoot out of a black plastic tube on the passenger-side window. These pests are tiny and, with a wingspan of just a few millimeters, all but invisible. "You hear that little beating sound?" says Kathleen Parkes, a spokesperson for Verily Life Sciences, a unit of Alphabet. Jacob Crawford, a Verily senior scientist riding with Parkes, begins describing a mosquito-control technique with dazzling potential.
Google's 'mosquito cannon' releases millions of insects with a virus that wipes out the population
Google is making headway on a landmark project that hopes to one day rid the world of disease-carrying mosquitoes that can be a nuisance to some regions and dangerously fatal to others. The 'Debug Fresno' project, launched by Google parent company Alphabet's Verily life sciences unit, has been releasing millions of Aedes aegypti mosquitoes in northern California's Fresno County. Approximately 80,000 of the tiny, engineered mosquitoes, which have a wingspan of just a few millimeters, are set free from a roving van using a 'mosquito cannon' after being infected with a bacterium in the hopes of killing off the entire mosquito population in that area. The Aedes aegypti has white markings on its legs and a marking in the form of a lyre on the upper surface of its thorax. The mosquito originated in Africa but is now found in tropical and subtropical regions throughout the world.
Simultaneous 12-Lead Electrocardiogram Synthesis using a Single-Lead ECG Signal: Application to Handheld ECG Devices
Afrin, Kahkashan, Verma, Parikshit, Srivatsa, Sanjay S., Bukkapatnam, Satish T. S.
The recent introduction of wearable single-lead ECG devices of diverse configurations has intrigued the medical community. While these devices provide a highly affordable support tool for caregivers for continuous monitoring and for detecting acute conditions, such as arrhythmia, their utility for cardiac diagnostics remains limited. This is because clinical diagnosis of many cardiac pathologies is rooted in gleaning patterns from synchronous 12-lead ECG. If synchronous 12-lead signals of clinical quality can be synthesized from these single-lead devices, it can transform cardiac care by substantially reducing the costs and enhancing access to cardiac diagnostics. However, prior attempts to synthesize synchronous 12-lead ECG have not been successful. Vectorcardiography (VCG) analysis suggests that the cardiac axis synthesized from earlier attempts deviates significantly from that estimated from 12-lead and/or Frank lead measurements. This work is perhaps the first successful attempt to synthesize clinically equivalent synchronous 12-lead ECG from single-lead ECG. Our method employs a random forest machine learning model that uses a subject's historical 12-lead recordings to estimate the morphology, including the actual timing of various ECG events (relative to the measured single-lead ECG), for all 11 missing leads of the subject. Our method was validated on two benchmark datasets as well as paper ECG and AliveCor-Kardia data obtained from the Heart, Artery, and Vein Center of Fresno, California. Results suggest that this approach can synthesize synchronous ECG with accuracies (R²) exceeding 90%. Accurate synthesis of 12-lead ECG from a single-lead device can ultimately enable its wider application and improved point-of-care (POC) diagnostics.
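The accuracy figure quoted in the abstract is reported as R2. Assuming this denotes the standard coefficient of determination between a measured lead and its synthesized counterpart (the abstract does not spell out the definition), a minimal sketch of that metric is shown below. The function name and the sample values are hypothetical and are not data from the study.

```python
def r_squared(measured, synthesized):
    """Coefficient of determination between a measured and a synthesized signal.

    Assumes both sequences have the same length and that the measured
    signal is not constant (otherwise the denominator would be zero).
    """
    mean = sum(measured) / len(measured)
    ss_res = sum((m - s) ** 2 for m, s in zip(measured, synthesized))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical samples from one lead (arbitrary units), for illustration only.
measured_lead = [0.0, 0.2, 1.0, 0.3, 0.0, -0.1]
synthesized_lead = [0.0, 0.25, 0.9, 0.35, 0.05, -0.1]
print(r_squared(measured_lead, synthesized_lead))
```

A value of 1.0 means the synthesized samples match the measured ones exactly, while 0.0 means the synthesis does no better than predicting the mean of the measured signal.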